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We present three algorithms for calculating rate constants and sampling transition paths for rare 
events in simulations with stochastic dynamics. The methods do not require a priori knowledge of 
the phase space density and are suitable for equilibrium or non-equilibrium systems in stationary 
state. All the methods use a series of interfaces in phase space, between the initial and final 
states, to generate transition paths as chains of connected partial paths, in a ratchet-like manner. 
No assumptions are made about the distribution of paths at the interfaces. The three methods 
differ in the way that the transition path ensemble is generated. We apply the algorithms to kinetic 
Monte Carlo simulations of a genetic switch and to Langevin Dynamics simulations of intermittently 
driven polymer translocation through a pore. We find that the three methods are all of comparable 
efficiency, and that all the methods are much more efficient than brute force simulation. 



INTRODUCTION 

Rare events are fluctuation-driven processes which oc- 
cur infrequently. Many natural processes can be classi- 
fied as rare events, including the nucleation of crystals or 
protein aggregates, chemical reactions, earthquakes and 
some meteorological phenomena. For these processes, the 
average waiting time between events is orders of magni- 
tude longer than the timescale of the event itself. In this 
situation, conventional "brute force" simulation is highly 
inefficient. This is because few, if any, events are likely 
to happen in the accessible simulation time, and the ma- 
jority of the computational effort is spent in simulating 
the uneventful waiting time. Simulation of rare events re- 
quires the use of specialized techniques, such as Bennett- 
Chandler methods P, or Transition Path Sampling 
0, 0, . Such techniques have been extensively used for 
problems including crystal nucleation, membrane perme- 
ation, ion transfer reactions and peptide folding. How- 
ever, these methods require knowledge of the phase space 
density in the initial state and as a result they are only 
suitable for (possibly metastable) equilibrium systems. 
By "equilibrium" , we mean systems where detailed bal- 
ance is satisfied and the phase space density is known. 
For example, in equilibrium at constant particle number, 
volume and temperature (NVT), the phase space density 
follows the Boltzmann distribution. For non-equilibrium 
systems in steady state - i.e. systems in which there 
are, on average, probability currents in phase space - the 
steady-state phase space density is generally not known 
a priori. Consequently, "conventional" rare event tech- 
niques cannot be used for non-equilibrium systems. In 
this paper, we present several techniques that do not re- 
quire knowledge of the phase space density and are there- 
fore suitable for rare events in steady-state systems in or 
out of equilibrium. 

Rare events in non-equilibrium systems constitute a 
host of important problems that have thus far been gen- 
erally inaccessible to simulations. Examples include crys- 



tal nucleation under shear, polymer conformational tran- 
sitions in hydrodynamic flows, driven transport through 
membranes and most rare events in biological systems. 
To our knowledge, the only scheme to have been pro- 
posed for obtaining transition paths for rare events out 
of equilibrium in stochastic dynamical systems is that of 
Crooks and Chandler 0. Here, transition trajectories 
(paths) connecting the initial and final states are char- 
acterized by their random number history. New transi- 
tion paths are generated by making changes in the ran- 
dom number history of previously generated paths. This 
method requires that paths do not diverge significantly 
upon changing the random number history; for high di- 
mensional systems, the Lyapunov instability is likely to 
lead to inefficiency. 

For equilibrium systems, a variety of rare event tech- 
niques exist. Some of these, for example Bennet- 
Chandler methods |3| , involve the calculation of the 
free energy along a pre-determined reaction co-ordinate. 
These methods do not generate transition trajectories 
and moreover, choice of an inappropriate co-ordinate 
leads to inefficient calculation of the rate constant. Other 
methods, such as Transition Path Samphng [^0|^ and 
Transition Interface Sampling [3i > not require the 
specification of a reaction co-ordinate and do generate 
transition paths. These methods require that the transi- 
tion occurs very rapidly, since new paths are generated 
by a shooting procedure and tend to diverge, leading to 
inefficiency. String methods ^3 have also been devel- 
oped, but have not yet been implemented for large sys- 
tems. Finally, several methods, such as Milestoning [llj 
and Partial Path Transition Interface Sampling use a 
series of interfaces in phase space, like the methods to be 
discussed here. However, these techniques assume that 
the distribution of transition paths at the interfaces fol- 
lows the equilibrium distribution: an assumption which 
is unlikely to be justified in many cases [l^ . 

In this paper, we discuss several alternative schemes 
for calculating rates and obtaining transition paths in 
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stochastic dynamical systems. As well as enabling the 
efficient simulation of rare events in non-equilibrium 
steady-state systems, the methods also avoid many of 
the difficulties associated with existing equilibrium rare 
event methods. The methods do not require the spec- 
ification of a reaction co-ordinate, and transition paths 
are generated without any requirement on their length 
(since paths are generated by a ratchet-like procedure 
and not by shooting from previous paths). Furthermore, 
although a series of interfaces in phase space is used, no 
assumptions are made about the distribution of paths at 
the interfaces. The first method. Forward Flux Sampling 
(FFS), was presented in an earlier publication 

After an introduction to the theory, we give a detailed 
description of FFS (Section^, and also of two more path 
sampling schemes, the "Branched Growth" method (Sec- 
tionj and the "Rosenbluth" method (Section^). The lat- 
ter methods have been developed in analogy to efficient 
schemes for sampling polymer chain configurations. The 
BG method also resembles a technique used for comput- 
ing rare event probabilities in the field of telecommunica- 
tions . In Section„ we discuss a "pruning" method for 
increasing the efficiency of the path sampling schemes. 
All three schemes are then demonstrated for two very 
different systems: in Section „ the flipping of a genetic 
switch is modelled using kinetic Monte Garlo simula- 
tions and in Section „ we apply the methods to Langevin 
Dynamics simulations of driven polymer translocation 
through a pore. We discuss the methods in the context of 
other rare event techniques, and assess their advantages 
and disadvantages in Section^ Finally, Appendices „□ and 
□contain theoretical justifications of the algorithms, an al- 
ternative reweighting scheme for the Rosenbluth method 
and a detailed discussion of the pruning scheme. 



THEORETICAL BACKGROUND 

We assume that the rare event can be viewed as a 
spontaneous transition between two well-defined regions 
of phase space A and B; by "phase space", we mean 
the set of all parameters that characterize the system. 
We are interested in calculating the rate constant kAB- 
the average rate of transitions from A to B. We use 
the "effectivepositive flux" expression described by Van 
Erp 0, H, ^3 ■ ^ ^^'^ ^ defined in terms of a 

parameter A(a;), such that A < Aa in A and A > As in _B. 
Here, x denotes the co-ordinates of the phase space. A 
series of non-intersecting surfaces in phase space Aq . . . A„ 
are chosen, such that Aq > Aa, A„ = As and A^ > Ai_i. 
These must be chosen such that any path from A to B 
passes through each surface in turn, not reaching A^+i 
before it has crossed A^. This is illustrated in Figure ^ 
Please note the change in notation in the numbering of 
the interfaces, compared to our earlier paper [3 | . 

Defining the history-dependent functions and hjs 




FIG. 1: Schematic illustration of the definition of regions A 
and B and the interfaces Aq . . . A„ (Here, n — 3). Three 
transition paths are shown. 



such that = 1 and /ig = if the system was more 
recently in A than in B, and /i^ = and /ig = 1 other- 
wise, the rate constant kAB for transitions from A to B 
is given by 



kAB = 



P(A„|Ao). 



(1) 



Here, ^A.j is the flux of trajectories with Ha — 1 (i.e. 
coming from A), that cross A^ for the first time: thus 
^A,n is the flux of trajectories reaching B from A and 
^ Afi is the flux reaching the first interface Ao from A. 
The overbar denotes a time average, and the factor Iia is 
the average fraction of the time that the system spends in 
the "basin of attraction" oi A. P(A„|Ao) is the probabil- 
ity that a trajectory that reaches Aq subsequently arrives 
in B instead of returning to A. Eq.(^ states that the 
total flux from A to B is the flux of trajectories from 
A to Ao, multiplied by the probability that such a tra- 
jectory will later reach B. In this way, the problem of 
calculating the very small flux '^A,n is reduced to a cal- 
culation of a larger flux ^ a.Qi and a small probability 
-P(A„|Ao). P(A„|Ao) can then be expressed as the prod- 
uct of the probabilities P(Ai+i|Ai) that a trajectory that 
comes from A and crosses \i for the first time, will sub- 
sequently reach Ai+i instead of returning to A: 



p(A„iAo)- n^(^'+ii^») 



(2) 



It is important to point out that Eq.|21) docs not im- 
ply an assumption that the system is Markovian. This 
is because the conditional probabilities P(Ai+i|Ai) are 
implicitly weighted over the ensemble of paths reaching 
\i from A, as shown in Appendix g The discussion in 
Appendix b also shows the equivalence of the averaging 
procedures used to evaluate P(A„|Ao) in the three path 
sampling methods described in this paper. 
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The methods described in this paper allow one to sam- 
ple the Transition Path Ensemble (TPE), as well as cal- 
culating the rate constant. The TPE is the collection 
of all transition trajectories (paths) from A to B that 
would be obtained if an infinitely long brute force sim- 
ulation were to be performed. Analysis of the TPE can 
lead to a mechanistic understanding of the rare event in 
question through, for example, the calculation of com- 
mittor distributions 0]. We shall see in Sections and „ 
as well as Appendix „ that the three methods discussed 
here generate transition paths belonging to the TPE in 
an efficient manner and with the correct weights. 

The starting point for all three methods is the choice 
of the parameter A(a;) and the definition of phase space 
regions A and B. X{x) must increase monotonically as 
the interfaces Aq . . . A„ are crossed. However, there is no 
assumption that A is the reaction co-ordinate: transition 
paths are free to follow any possible path between A and 
B, including paths which "loop back", crossing some in- 
terfaces several times. A good choice of A will improve 
the efficiency of the calculation but will not affect the 
final rate constant or transition paths. In fact, for the 
test systems of Sections „ and „ our choice of A is very 
simple and is unlikely to correspond to the true reaction 
co-ordinate. The interfaces Ao . . . A„ are placed between 
A and B, with A„ = Xb- We find that it is often conve- 
nient (although not necessary) to place Aq at the border 
of the A region: Aq = Xa- The optimum number and 
placement of the interfaces will be discussed in detail in 
a future publication 



ALGORITHMS 
The Forward Flux Method 

The first of our three methods is Forward Flux Sam- 
pling (FFS), which was introduced in an earher paper 
For clarity, we describe the method again here, to- 
gether with some recent improvements. The FFS algo- 
rithm begins with a simulation run in the basin of attrac- 
tion of region A. The parameter A is monitored during 
this run. Each time the trajectory leaves A and reaches 
Ao for the first time since leaving A, a counter q is incre- 
mented and the phase space co-ordinates of the system 
are stored. The run is then continued until A^o points 
at Ao have been collected, as illustrated in Figure 
If, during this run, the system happens to enter B, it 
is replaced in A and re-equilibrated. The number Nq of 
collected points should be as large as possible, in order 
to obtain good sampling of the transition paths. As dis- 
cussed in Section B if Nq is so small that it is similar to 
the number n of interfaces, sampling problems will occur 
due to "genetic drift" . In the examples of Sections „ and 
B A^o is of the order of thousands. The flux ^A,o/h_A_ is 
given by ^A,o/hA = q/T, where T is the total length of 



the simulation run. Figure[5^ shows this first stage of the 
algorithm: crossings of Aq that are labeled with a black 
circle contribute to q and to the collection of points at 
Ao. In practice, it may be convenient not to store the 
co-ordinates of every "black circle" , but rather to select 
crossings in some unbiased way. We note that q must be 
incremented for every "black circle" crossing. 
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FIG. 2: The first (a) and second (b) stages of the FFS method. 
The distribution of points at the interfaces depends on the 
history of the paths, as illustrated by the dashed lines in (b). 
The circles are members of the collection of points at the 
interfaces Xi. 



In the next stage of the algorithm, we estimate the 
probability P(Ai4-i|Ai) of reaching A^+i from A;, instead 
of returning to A. Starting with the collection of A'o 
points at Ao, Mq trial runs are carried out. Each trial run 
consists of selecting a point from the collection at random 
and using it as the starting point for a simulation run 
which is continued until either Ai or Xa is reached. If the 
trial run reaches Ai then a counter Ns is incremented, 
and the final point of the run is stored in a new collection 
of points at Ai. After the Mq trial runs, we are left with 
an estimate for P(Ai|Ao) = Ng^'' /Mq, and a collection of 
Ai — A^i"-* points at Ai. Using this collection of points. 
Ml trial runs arc then carried out, each time selecting a 
starting point at random and running a simulation until 
either A2 or A^i is reached. This procedure is repeated 
for each interface A^, each time using the collection of 
points generated at the previous interface and firing trial 
runs as far as A^+i or A^i. A possible way of improving 
efficiency by eliminating long paths back to A will be 
discussed in sectiounand Appendix„ The end result of the 
trial run procedure is an estimated value of P(Ai+i|Ai) = 
Ns'^'^ /Mi, for each interface i. Multiplying these together 
as in Eq.ll^J leads to an estimate for P(A„|Ao). This can 
then be multiplied by the fiux ^A.a/hA calculated in the 
first stage, to give the rate constant kAB- 

The method described here is sl ight ly different from 
that outlined in our earlier paper [lj|. The collection 
of points at Aj+i now consists of the end points of all 
the Nl successful trajectories from A,; previously, only 
a user-defined number of points was stored and the rest 
were discarded. Storing a larger number of points at the 
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interfaces leads to better sampling, with no increase in 
cost. In addition, the only user-defined parameters are 
now the number A'o of initial points and the numbers 
Mi of firing runs to be carried out at each interface (as 
well as the number and position of the interfaces). It 
is important to ensure that the Mi are large enough to 
generate sufficient points at the next interface for good 
sampling. In our earlier paper, we also described a pro- 
cedure whereby a series of "sub-interfaces" between each 
pair of interface Xi and A^+i were used to construct his- 
tograms for P(A|Ai), which were then fitted together to 
obtain P(A„|Ao). The aim of this approach was to reduce 
the accumulation of errors caused by the multiplication 
of conditional probabilities in Eq.©- While this fitting 
procedure gives the correct rate constant, we find that in 
practice it does not improve the efficiency of the method 
and we do not therefore use it. 



A method similar to FFS has recently been used to 
study the crystal nucleation of sodium chloride [l7|. 



The Branched Growth Method 

We now describe an alternative path sampling and rate 
constant calculation scheme: the "Branched Growth" 
(BG) method. Both the BG method and the Rosenbluth 
scheme of Section g are similar to techniques originally 
developed for the efficient sampling of polymer configu- 
rations 0]; an analogy is used between transition paths 
and conformations of a polymer chain, with partial paths 
between interfaces playing the role of polymer segments. 
The BG method also resembles a technique that is used 
to compute probabilities of rare events in telecommuni- 
cation systems [Tsf . 
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FIG. 3: Schematic illustration of the extraction of the transi- 
tion path ensemble from the FFS procedure. All partial paths 
that reach the subsequent interface are shown. Partial paths 
that do not contribute to the TPE are shown by dotted hues. 
The solid lines correspond to the TPE; the width of the line 
indicates the weight of the contribution of a particular partial 
path to the TPE. 
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FIG. 4: A schematic view of the generation of a branched path 
(thick lines) using the Branched Growth Sampling method. 
The simulation run in the A basin is shown by a dotted line. 
Trial runs which fail to reach Ai+i are shown by thin lines. 
The generation of the initial point for the next path is also 
shown. 



The FFS method generates transition paths according 
to their correct weights in the Transition Path Ensemble 
(TPE), as shown in Appendix □ In order to extract these 
paths, the phase space co-ordinates of the system must 
be stored for all points along all trial runs which success- 
fully reach A^+i from Ai. One must also store information 
on the connectivity of the partial paths; i.e. each suc- 
cessful trial from A^ to Ai+i is annotated with an index 
that describes its initial point at A^. Once the trial run 
procedure is complete, transition paths are obtained by 
following the trials that reach B back to A„_i, following 
their initial points back to Xn-2, and so on back to A. As 
illustrated in Figure 13 this results in a "branching tree" 
of transition paths, in which partial paths close to A may 
be shared by many members of the TPE. The resolution 
in phase space of the TPE is therefore better for phase 
space regions close to B than for those close to A; the 
TPE that is produced is nevertheless correctly weighted. 



The BG method begins with a simulation in the basin 
of attraction of A, which is suspended when the system 
leaves A and crosses Aq. The resulting system config- 
uration at Ao is then used as the starting point for 
trial runs, which are continued until either Ai or Xa is 
reached. Each trial run is assigned a "weight" 1/fco. If 
Ns^^ > of the trials reach Ai, then each of the ivi"' 
end points at Ai becomes a starting point for fci new 
trial runs, which have weight l/(fcofci), and which are 
continued until A2 or Xa is reached. Each of the TVi^' 
total successful trials from Ai to A2 generates a start- 
ing point for k2 trial runs to A3, with weight I / (kQkik2) , 
and so on until A„ = Xb is reached. Once the genera- 
tion of one branching path is over, either because B was 
reached, or because no successful trials were generated 
at some intermediate interface A^ , we obtain an estimate 
of P(A„|Ao) from the total weight of the branches that 
eventually reaches A„: P(A„|Ao) = Afi"~^V Iir^o^ 
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order to begin the generation of the next branching path, 
the simulation run in the A basin is resumed and a new 
starting point at Aq is generated the next time the system 
crosses Aq, coming from A (the system must return to 
A between subsequent starting point generations). The 
same trial run procedure is then used to create a "branch- 
ing tree" of paths from this starting point, resulting in 
a new estimate of P(A„|Ao). After many such branching 
paths have been generated, the final estimate of P(A„|Ao) 
is simply the average of the contributions due to all the 
paths (including those which failed to reach B: these 
make a zero contribution). The flux $a,o/^X can be ob- 
tained from the total number of crossings observed in the 
simulation in the A basin, divided by the total length of 
this run. The "branching trees" of paths connecting re- 
gions A and B which arise from this sampling method 
are correctly weighted members of the TPE, as shown 
in Appendix „ By storing the phase space co-ordinates 
of the points in the trial runs which successfully reach 
Ai+i, one can obtain branching transition paths, which 
can be analyzed to obtain information on the mechanism 
by which the rare event occurs. 

The Branched Growth method is illustrated schemat- 
ically in Figure^ As in FFS, the TPE is sampled with 
better resolution in the phase space region close to B, 
since the transition paths are branched. 

The Rosenbluth Method 

Our third scheme, the "Rosenbluth" (RB) method, is 
related to the Rosenbluth scheme for sampling polymer 
chain conformations P, 0, . As for the BG method, 
transition paths are generated one at a time. In contrast 
to BG, however, the RB method generates unbranched 
paths. This means that the TPE is sampled evenly for all 
values of A and also makes the extraction and analysis of 
transition paths very easy. Furthermore, the RB method 
requires less storage of system configurations, which may 
be useful for large systems. 

For the FFS and BG methods, we show in Appendix 
□ that the TPE is automatically generated with the cor- 
rect path weights. However, as we shall see, in the RB 
method, when paths are initially generated they do not 
have the correct weights. A "reweighting" procedure is 
therefore needed in order to correctly sample the TPE. 
Here, we describe a Metropolis-type acceptance/rejection 
reweighting procedure in Appendix „ we also discuss 
an alternative "Waste-Recycling" scheme based on the 
recent work of Frenkel 0| . 

The generation of a transition path in the RB method 
takes place as illustrated in Figure El We begin with an 
initial point at Aq, which is obtained in the same way 
as for the FFS and BG methods, using a simulation run 
in the basin of attraction of A. The point at Aq is used 
to initiate ko trial runs, which are continued until they 




FIG. 5: A schematic view of the generation of a transition 
path using the Rosenbluth Sampling method. The simulation 
run in the A basin is shown by a dotted line. The transition 
path is shown by bold lines. Trial runs which do not form part 
of the transition path are shown by thin lines. The generation 
of the next starting point at Ao is also illustrated. 



reach either Ai or Xa- If at least one of these successfully 
reaches Ai, we choose one successful trial at random, and 
use its end point at Ai as the starting point for a set of ki 
trial runs, which end either at A2 or at Xa- Once again, a 
successful trial is chosen at random and used to continue 
the path. This procedure is repeated until either B is 
reached, or no successful trials are produced. 




FIG. 6: Two transition paths generated by the Rosenbluth 
method. The bottom path must be reweighted by a factor of 
9 relative to the top path. 

The RB method generates unbranched transition 
paths, in contrast to the FFS and BG methods. For 
FFS and BG, paths for which more trial runs are suc- 
cessful produce more branches and make a greater con- 
tribution to the TPE, resulting in "automatic" correct 
weighting of the transition paths. In the RB method, 
however, one successful trial at each interface is chosen, 
regardless of how many successful trials there were. This 
leads to paths being generated with incorrect weights: 
as illustrated schematically in Figure El paths for which 
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more trials were successful must be given an increased 
weight in the TPE relative to those for which fewer trials 
were successful. We show in Appendix „ that the weight 
of each generated transition path must be multiplied by 
the "Rosenbluth factor" W, which is given by: 

n-l 

W^Yl iVi^) (3) 

In the illustration of Figure (HJ = 2 for the top path 
and W — 18 for the bottom path. W in fact corresponds 
to the number of branches that would have been present, 
had we been using the BG scheme. As well as the gener- 
ation of paths, we shall also discuss the computation of 
the probabilities P(Ai+i|Ai). For this, it is important to 
note that the weighting factor for an incomplete transi- 
tion path - i.e. one that connects ^ to Ai, is 

W^^ll TVi^") (4) 

3=0 

We now describe a practical scheme for sampling 
correctly weighted transition paths and for calculating 
the probabilities P(Ai+i|Ai). The scheme (which we 
denote RB/M) uses a Metropolis acceptance/rejection 
reweighting procedure flj. An alternative "Waste- 
Recycling" approach (denoted RB/WR) is discussed in 
Appendix □ The algorithm is as follows: 

(1) Define values wj;"^ and Wj*"""*. For each interface 
< i < n, define values rrii — 0, w/°\ w/"\ p'f\ p[^\ 
pcum _ Q Define arrays of system configurations P*^"^ 
and V^"^ in which to store transition paths. 

(2) Begin or continue a simulation run in the A basin. 
When the system leaves A and crosses Aq, suspend this 
run. Denote the system configuration as xq. Set i = 0. 

(3) Increment mo — > toq -t- 1. Initiate fcp trial runs from 
xq. Continue each trial run until either Ai or A^i is 
reached. Calculate the number ivj*''' of trials which reach 
Ai. Increment pg™ ^ pg"" A^i^V^o- Set 1^/"^ = A^i"^ 
and 1^1^"^ = ivf \ 

(4) If > 0, choose one successful trial at random. 
Denote the final point of this trial as xi. Add the 
configurations corresponding to this trial run into the 
array 'P^'^\ Set i = 1. Otherwise (if Ai°^ = 0), return to 
step (2). 

(5) Increment mi to; + 1. Initiate ki trial runs from 
Xi. Continue each trial run until either Ai+i or Xa 

(i) 

is reached. Calculate the number Ns of trials which 
reach A^+i. Set = N^'^/h. If N^'^ > 0, select one 
successful trial at random and denote the final point of 
this trial as x^+i. 

(6) If nii = 1, set p[°^ = pf^ and W^°^ = 11^^"^ 
If TTii > 1, draw a random number < r < 1. If 
r < Wj;''^/Wt\ set p^^ = p^^ and wj;"^ = \ If 



remain unchanged. 

(7) Increment p'?™^ pCum _|_ p(o) Increment 

^ * ivi'\ Set W^ll = Wf * N^^K 

(8) If A^i*'' = 0, return to step (2). Otherwise, increment 
i i + I and repeat steps (5)- (8) until i = n. 

(9) If i ^ n: if m„ = 1, set W^^ = wj'''^ and 
-pio) ^ p(n) Otherwise (if iTT-n > 0), draw a random 
number < r < 1. If r < Vr/"Vw^i°\ set = W^'^'' 
and -p(°) = P("). If r > W^""^ /Wt^°\ W^°'' and V^°'> 
remain unchanged. 

(10) The path P^") is a member of the TPE and should 
be included in any analysis of the transition mechanism. 

(11) Repeat steps (2)-(10) many times. 

(12) For each interface < i < n, calculate 
P(A,+i|A,) = p^'^/m,. The flux $a,o is given by 
nio/T where T is the total length of the simulation run 
in the A basin. 

In this scheme, transition paths are generated by 
shooting ki trials from each interface i and selecting one 
successful trial at random, rrii denotes the number of 
paths to interface i that have been generated. When 
a complete path from ^ to i? has been generated, its 
Rosenbluth weig given by Eq.lO, is compared 

to the Rosenbluth weight wj°'' of the previous complete 
path to be accepted (step (9)). The newly generated path 
is accepted if W^'^^ /W^"'' > r where r is a random number 
between and 1 (unless it is the first path to be gener- 
ated (m„ = 1), in which case it is always accepted). This 
Metropolis scheme has the effect of reweighting transition 
paths according to their Rosenbluth factors. The scheme 
also incorporates Metropolis acceptance/rejection steps 
at every interface, in step (6). This is necessary for cor- 
rect calculation of the probabilities P{Xi+i\Xi), since the 
probability estimate pi — Ns^'^ /ki obtained by firing ki 
trial runs from interface A^ must be reweighted by a fac- 
tor Wi given by Eq. 10} which depends on the partial path 
leading from ^ to Ai . We note that the generation of the 
transition path continues to interface i + 1 regardless of 
the outcome of the acceptance/rejection step at interface 
i. We also note that the "previous partial path to be 
accepted" at interface i need not have any segments in 
common with the "previous partial path to be accepted" 
at any other interfaces. 

In Appendix □ we demonstrate that this scheme indeed 
samples the TPE correctly, and we discuss the differences 
between this approach and the Rosenbluth method for 
the sampling of polymer configurations. 



Pruning 

The analogy with polymer simulations also suggests a 
possible improvement to the efficiency of all three meth- 
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ods. By "pruning" trial paths which go in the direction of 
region A, we can avoid the computational expense of inte- 
grating "failed" trials from interface i all the way back to 
A^. In analogy with the Pruning method used for poly- 
mers trial paths which reach the previous interface 
at Aj_i from are terminated with a certain probabil- 
ity. Surviving trials must be re-weighted to preserve the 
correct weights in the final TPE. The implementation of 
pruning in the context of these path sampling methods is 
described in Appendix □ for the genetic switch and poly- 
mer translocation problems described here, we find that 
although the results for kAB continue to be correct when 
pruning is used, no significant improvement in efficiency 
is achieved. 



APPLICATIONS 
Genetic switch 

We now move on to a demonstration of the methods 
of Sections and □ for several non-equilibrium rare event 
problems. Our first test system is a set of chemical re- 
actions comprising a symmetric bistable genetic switch, 
which is simulated using a kinetic Monte Carlo scheme. 
This is a non-equilibrium system whose dynamics does 
not obey detailed balance 0, Examples of 

real genetic switches include the lysis-lysogeny switch of 
phage A and the lac system of E. coli (i^l , as well as 
artificially engineered bacterial genetic switches [j^ . 
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FIG. 7: Reaction scheme for the genetic switch. Proteins A 
and B can dimerize and bind to the DNA at the operator site, 
O. When A2 is bound to O, B is not produced, and when B2 
is bound to O, A is not produced. Both proteins are degraded 
in the monomer form. Forward and backward rate constants 
kf and kt are identical for A and B. 

Our model genetic switch is shown in Fig. 13 The 
switch consists of a piece of DNA containing two genes 
A and B, as well as a controlling operator site, O. The 
two genes encode proteins A and B, and we assume that 
(when O is unoccupied) each of these proteins is produced 
from the DNA in a one-step process with rate constant 
k. In nature, of course, protein production is a complex 



multi-step process, the details of which we ignore. Both 
proteins can dimerize and their dimers A2 and B2 can 
bind to the operator site - however, only one dimer can 
be bound at any time. The binding of dimers to the 
operator site has the effect of controlling protein produc- 
tion - when A2 is bound to O, A is produced at rate 
fc, but B is not produced. Likewise, when B2 is bound 
to O, B is produced at rate fc, but A is not produced. 
Each dimer therefore blocks the production of the other 
protein. Both proteins are also removed (by enzymatic 
degradation or dilution due to cell growth) at a constant 
rate of 0.25k. The genetic switch is bistable, having two 
steady states, one with a large number of A molecules, 
and few B, and the other with a large number of B and 
few A molecules. Switching between these states ("flip- 
ping") occurs due to stochastic fluctuations; the factors 
affecting the flipping rate have been extensively investi- 
gated [Hllliilia. 

We simulate the switch using the Gillespie algorithm 
[sfij i. This is a widely-used kinetic Monte Carlo scheme 
for propagating chemical reactions. In each simulation 
step, a random number is drawn from the correct (expo- 
nential) distribution and used to choose the time at which 
the next reaction will occur, and another random num- 
ber is used to determine which reaction this will be. The 
simulation time and numbers of molecules of all species 
are then updated accordingly. The phase space in these 
simulations is the number of molecules of each chemical 
species present in the system. A full description of the 
simulation algorithm is given in Ref. |30|. An initial 
version of the results for the FFS method, as well as a 
discussion of some interesting features of the TPE, was 
given in a previous publication [l^ . 
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FIG. 8: A as a function of time (in units of A; ^) for a typical 
simulation run. 



For the parameter A, we choose A = A'^b — Na, where 
A'a is the total number of A proteins, and Nb the total 
number of B proteins: 



A"a = + 2 {nA2 + "-0A2 ) 

A'"b = ?1b + 2 (TT-Ba + rioBa ) 



(5) 
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nx being the number of molecules of species X . Figure 
|S1 shows A as a function of time for a typical brute- force 
simulation run (note that the unit of time in these sim- 
ulations is fc^^). It is clear that the system is indeed 
bistable, and that transitions are rapid in comparison 
to the waiting time between events. The parameters of 
FigElhave been chosen to give a rather fast flipping rate, 
which can be measured using brute force simulation. We 
define regions A and B hy Xa = = —24 and Xb ~ 24. 
A "flip" is considered to have occurred when the sys- 
tem enters region B, having come from A (i.e. having 
= 1, meaning it was in A more recently than in _B). 
To calculate kAB, the integral F{t) — j^dt'p{t') of the 
distribution p{t) of times between flips was fitted to the 
Poisson function F{t) — 1 — exp [— fc^si], leading to a re- 
sult kAB = (9.4±0.2) X lO^'^A:. This calculation was done 
over a total brute force simulation time of 9 x lO^fc"^, 
during which 8808 flips occurred. 



f/k X 10"^ Pb X 10"^ 

BF 

FFS 1.221 ±0.005 7.8 ±0.1 
BG 1.212 ±0.006 7.6 ±0.2 
RB/M 1.220 ±0.004 7.8 ±0.1 
RB/WR 1.223 ±0.004 7.7 ±0.2 



kAs/k X lO"'^ N,t X 10^ 



9.4 ±0.2 
9.4 ±0.2 

9.3 ±0.2 

9.4 ±0.1 
9.4 ±0.3 



14.8 
1.1 
0.5 
1.8 
1.0 



TABLE I: Path sampling and brute force results for / — 
^A,o/hA, P(A„|Ao) and kAB- The brute force result is ob- 
tained by fitting F{t) as described in the text. Nst is the ap- 
proximate number of simulation steps performed in obtaining 
the result in the table. 

The results of calculations of the flipping rate us- 
ing FFS, BG, RB with Metropolis acceptance/rejection 
(RB/M) and RB with Waste Recycling (RB/WR), for 
the same parameter set, are shown in Table ^ In all 
cases, Xa — Xo = —24 and As = A„ = 24. We 
used 71 = 7 and the interfaces were placed at A^ — 
{-24, -22, -18, -15, -12, -9, -4, 24} (0 < i < n). The 
number of trials k for the RB and BG methods was 
ki = {6,5,4,4,5,5,4} {0 < i < n). For FFS, we chose 
A^o = 1000 and the number M of trials at each inter- 
face was M, = {6000,5000,4000,4000,5000,5000,4000} 
(0 < I < 7i). The calculations were carried out as 
a series of "blocks", each consisting of 1000 starting 
points for the RB and BG methods, and of one FFS 
run for FFS. Results were averaged over all blocks. Ta- 
ble ^ shows excellent agreement with the brute force 
result for all the path sampling methods. The val- 
ues of P(Ai+i|Ai) were found to be: P{Xi+i\Xi) = 
{0.25,0.20,0.30,0.26,0.24,0.24,0.34} for < i < n. The 
approximate number of simulation steps performed in 
obtaining the result in Table ^ is also given: it is clear 
that all the path sampling methods are much more effi- 
cient than brute force simulation, even for this relati vely 
rapidly flipping switch. In a previous publication |14| . 
it was demonstrated that the improvement in efficiency 



of FFS over brute force simulation was dramatically in- 
creased when the switch flipping rate was decreased. In 
this work, we have not attempted to optimize the num- 
ber and positioning of interfaces, or the number of trials 
carried out at each interface. Table HI does not, therefore, 
provide a reliable guide to the relative efficiency of the 
various path sampling methods. However, we can make 
the general observation that the compuational efficiency 
is of the same order of magnitude for all of the meth- 
ods. This issue will be discussed in detail in a future 
publication [l^ . 



Driven Polymer Translocation through a pore 

Our second test system represents a simplified ap- 
proach to the important problem of polymer transport 
through a nanopore. This is a widely occurring phe- 
nomenon: biological examples include protein transloca- 
tion through pores, RNA transport across the nuclear 
membrane, and injection of genetic material by viruses, 
while technological applications include gene therapy and 
sequencing of DNA. In general, translocation does not oc- 
cur in equilibrium, but in response to a driving force, such 
an electrical field or the action of motor proteins. An im- 
portant difference to the genetic switch flipping of Section 
□ is that this is not a bistable system but rather an escape 
problem: translocation events occur only in one direction 
and after each event the system is re-equilibrated in its 
original configuration. 
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FIG. 9: (a) An illustration of the polymer simulation, (b) A 
"zoomed in" view, showing the three regions used to define 
A, in Eq. CT . 



We have applied the path sampling methods of Sec- 
tion B B ^-iid B to Langevin Dynamics simulations of the 
non-equilibrium, unidirectional translocation of a poly- 
mer through a pore. The simulation setup is shown 
schematically in Figure El Our model polymer consists 
of A^ monomers, each of which interacts with all other 
monomers via a spherical Lennard- Jones potential with 
parameters e and a: 



vij{r) = 4e 



12 



crx6 

r 



(6) 
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where r is the distance between the monomers. This 
interaction is truncated at r = 3(t. Each monomer also 
interacts with its neighbours along the chain via a linear 
spring potential (spring constant Sg) of the form: 



and 



Vs{r) = Ss{r - ro)^ 



(7) 



Here, r is the distance between two neighbouring 
monomers and rp is the bond length. The pore, of radius 
i?, embedded in a slab of width L, is modelled by a re- 
pulsive Lennard-Jones potential with parameters e^, and 



(8) 



where r is now the shortest distance between a monomer 
and the wall of the pore or slab. This interaction is also 
truncated at r = Scr. 

We do not aim at present to undertake a detailed study 
of the mechanism of polymer translocation. We therefore 
neglect the process by which the polymer arrives at the 
pore mouth, constraining the first monomer in the chain 
not to move far from the pore entrance on the left-hand 
side. This is achieved by applying a harmonic restraining 
force (spring constant Shr) to the first monomer, of the 
form: 



fhrin) 



-Shrin 

= 



R) ri> R 
otherwise 



(9) 



In Eq.(|5Jl, ri is the distance of the first monomer from 
the point (— L/2,0, 0). The force acts along the vector 
connecting the first monomer to this point. If the first 
monomer is within a hemisphere of radius R around the 
pore mouth, or is inside the pore, or beyond the pore on 
the right-hand side, the restraining force is zero. 

To model the pulling of the polymer through the pore, 
we suppose that there exists some mechanism which ex- 
erts force on any monomers which are inside the pore. 
This force is, however, intermittent in time: the pore 
fiips between states ON and OFF at rates fci (for the off 
on transition) and fc_i (for the on — > off transition). 
When the pore is in the ON state, all monomers inside 
the pore experience a force fpuu in the positive x direc- 
tion. When the pore is in the OFF state, no pulling force 
is exerted. Although this model is not meant to repre- 
sent any particular system, intermittent pulling forces of 
this kind might be produced by motor proteins localized 
inside pores. The intermittent pulling force makes this a 
non-equilibrium system. 

The monomers also experience stochastic forces due to 
the effects of solvent, and their dynamics is simulated 
according to the usual Langevin Dynamics algorithm: 



Pia{t) 



m 



(10) 



Piait) = -£.Pia{t) + fia{t) + Pia{t) 



(11) 



where is the i-th component of the position vector of 
monomer i, pia is the i-th component of its momentum 
vector and fia is the i-th component of the force acting on 
it due to the other monomers, the interactions with the 
wall, the pulling force and (for the first monomer only) 
the constraint force. The parameter m is the monomer 
mass, ^ is the friction constant, related to the diffusion 
constant D hy ^ = ks/mD, and pia is a "random force" 
representing collisions with the solvent molecules and sat- 
isfying 



{Pia{t)pil3{0)) = 2mkBT^S{t)Sal3 



(12) 



Equations (|10|l and Hll|) are integrated with a finite 
timestep dt using the predictor-corrector- type algorithm 
given in the book of Allen and Tildesley |3l|. If, at a 
particular step, the state of the pulling force is OFF, it is 
changed to ON with probability kidt, and if it is ON, it is 
changed to OFF with probability k-idt. Once the entire 
polymer has passed through the pore {i.e. all monomers 
are located at a: > L/2), we replace it in its starting posi- 
tion (a pre-equilibrated configuration), re-equilibrate for 
Teq = lOOfT^/D and continue the simulation. 

The parameters of the simulation were chosen such 
that the monomers attract each other strongly and the 
polymer adopts a globular configuration before enter- 
ing the pore. To enter the pore, the polymer is forced 
to adopt an energetically unfavourable extended config- 
uration. This scenario could model protein transloca- 
tion. From the point of view of our calculations, it has 
the advantage of ensuring that the waiting time between 
translocation events is long compared to the length of the 
events themselves. The parameter values were: N = 10, 
R ^ 2a, L ^ 2(7, dt = 0.02a'^/D,ro = 0.5ct,s, = 
2kBT/a'^,e = 2.5kBT,eu} = O.^ksT^aw = Icr fhr = 
5kBT/a\fpuu = 1.0fcsr/a,fci = 10a~^D, = la-^D 
and Lx = Ly ~ Lz — 200cr. Our units of length are taken 
to be the Lennard-Jones interaction parameter ct; units 
of mass are m, units of energy are ksT and the diffu- 
sion constant defines the units of time, which are /D. 
The simulation box is cuboidal, with dimensions L^, Ly, 
Lz] periodic boundary conditions were used in all three 
dimensions. 

We define the parameter A in a rather trivial way. We 
consider three regions, illustrated in Fig. ^p: the hemi- 
spherical region of radius R around the left-hand pore 
mouth (region I), the region inside the pore (region II), 
and the region outside the pore on the right-hand side 
(region III). Taking n/, nn and nm to be the numbers 
of monomers in the three regions, we define 



A 



nj/4 + nii/2 + nm 
N 



(13) 
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During the translocation process, A increases from a value 
of A < I /{AN), to unity. We note that expression itT^ 
is chosen merely for convenience, and is not expected to 
reflect the true reaction mechanism. A simpler definition 
might be the number of monomers which have already 
translocated (A = n///) - although this would also lead 
to the correct value of kAB, in practice it gives rather 
few crossings of the first interface and is therefore less 
efficient. 
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FIG. 10: (a): A as a function of time (in units of a'^D—1) for 
a typical brute force simulation run. 



Figure EH shows A as a function of time for a typ- 
ical brute force simulation run. Translocation events 
occur rapidly compared to the waiting time between 
events. Defining an event to have occurred at the mo- 
ment that the system crosses the interface A„ ^ 1, the 
waiting time distribution p(t) can be measured and its 
integral F{t) = dt' p{t') fitted to the Poisson function 
F{t) = 1 — exp [— fcyisi], in order to measure kAB- This 
resulted in a brute force measurement kAB = (1-48 ± 
0.02) X 10~^Da^^, obtained by simulating 5912 translo- 
cation events. 



/ X 10" 



Pb X 10'^ kAB X 10~* Nst X 10** 
BF - - 1.48 ±0.02 20.0 

FFS 1.084 ± 0.006 1.36 ± 0.02 1.48 ± 0.02 5.5 
BG 1.084 ± 0.006 1.35 ±0.02 1.47 ±0.02 3.2 
RB/M 1.086 ±0.006 1.31 ±0.03 1.43 ±0.03 4.9 
RB/WR 1.092 ±0.006 1.35 ±0.03 1.47 ±0.03 4.9 



TABLE II: Path sampling and brute force results for / = 
^A,o/hA, -P(A„|Ao) and kAB- Units of / and kAB are Da~^ . 
The brute force result is obtained by fitting F{t) as described 
in the text. Errors represent the standard error in the mean 
of a series of independent estimates. Nst is the approximate 
number of simulation steps performed in arriving at the result 
given in the table. 

The FFS, BG and RB methods (using both Metropolis 
acceptance/rejection and Waste Recychng) were applied 
to the polymer translocation problem, using the defini- 
tion 1)1 3|l of A. States A and B were defined by Aa = Ao = 



0.025 and \b — A„ — 1.0. We used n — 7, with interfaces 
positioned at A^ = {0.025, 0.05, 0.1, 0.15, 0.2, 0.3, 0.5, 1.0} 
for < i < n. For the FFS method, we used No = 500 
and the number of trials M at each interface was Mi — 
{1500,1500,1000,1000,1500,1500,550} for < i < n. 
For the BG method, the number of trials per point was 
ki = {3, 3, 2, 2, 3, 3, 1}, while for both RB schemes it was 
ki = {6,6,4,4,6,6,2} (0 < i < n). In all cases, averages 
were taken over a series of "blocks", each consisting of 
500 starting points for the BG and RB methods, or of 
one FFS run. The results, given in table ITll show good 
agreement with the brute force results. The number Nst 
of simulation steps used in the calculations is significantly 
lower for the path sampling techniques. 



/ X 10"^ Pb X 10"^ 
FFS 9.88 ±0.06 3.4 ±0.1 
BG 9.70 ±0.06 3.6 ±0.1 
RB/M 9.77 ±0.03 3.8 ± 0.3 
RB/WR 9.83 ± 0.03 3.4 ± 0.2 



kAB X 10"* Nst X 10** 

3.4 ±0.1 3.6 

3.5 ±0.1 7.1 
3.7 ±0.3 20.9 
3.3 ±0.2 17.6 



TABLE III: Path sampling and brute force results for / — 
^A.o/hA, P{X„\Xo) and kAB, for the polymer translocation 
problem with the altered parameter set. Units of / and kAB 
are Da~^. Errors represent the standard error in the mean 
of a series of independent estimates. Nst is the approximate 
number of simulation steps performed in arriving at the result 
given in the table. 

The parameter set given above was designed so as 
to allow calculation of the translocation rate by brute 
force simulation, in order to test the path sampling 
methods. The methods can also be used, of course, 
for much rarer transitions where brute force simula- 
tion is not feasible. We have also carried out calcu- 
lations of kAB for an altered parameter set, which is 
as above, except that the monomer-monomer Lennard- 
Jones interaction parameter e is increased to e = SfcsT, 
and the wall-monomer interaction parameter Cw becomes 
Em = XksT . This implies very strong attraction between 
the monomers and very strong repulsion between the 
monomers and the pore. The same interfaces were used. 
For FFS, the same parameters were used: TVq = 500 
and M, = {1500,1500,1000,1000,1500,1500,550}. For 
the BG method, the number of trials per point was 
ki = {4, 5, 3, 4, 7, 10, 2}, while for both RB schemes it was 
ki — {12, 15, 9, 12, 21, 30, 6}. These parameters were cho- 
sen for convenience, but no systematic attempt was made 
at optimization; thus the results should not be used to 
compare efficiencies of the various path sampling meth- 
ods, although once again, we see that the efficiency of 
all the methods is within about the same order of mag- 
nitude, with the RB method being somewhat less effi- 
cient. The results for this rarer translocation problem 
are given in Table Hill Since the rate constant is 44 times 
smaller in this case, we can suppose that to obtain a 
brute force result of comparable accuracy, approximately 
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44 X 20 X 10 w 9 X 10 simulation steps would be re- 
quired. 

DISCUSSION 

In this paper, we have described three methods for the 
calculation of rates and the sampling of transition paths, 
for rare events in equilibrium or non-equilibrium stochas- 
tic dynamical systems in stationary state. What is the 
origin of the increased efficiency of these methods over 
brute force simulations? A general characteristic of rare 
events is that the system makes many "failed attempts" , 
in which a fluctuation drives the system in the direction 
of -B, for each "successful" transition from state A to state 
S. In a brute force simulation, one does not capitalize on 
these failed attempts, but simply waits for the rare suc- 
cessful transition. In the methods described here, once 
the system crosses a particular interface, this configura- 
tion is stored and trial runs are used to try to extend the 
path to subsequent interfaces. The interfaces thus allow 
us to capitalize on those fluctuations that drive the sys- 
tem in the direction of B, since the system advances from 
one interface to the next in a ratchet-like manner. This 
is the origin of the increase in efhciency over brute force 
simulation. Of course, situations may arise in which the 
majority of the fluctuations in the direction of B in fact 
lead into "blind alleys" , rather than generating transi- 
tion paths. This problem could perhaps be overcome by 
again exploiting the analogy with polymer simulations 
to develop a scheme based on the Recoil Growth method 

The approaches described in this paper differ greatly 
from existing path sampling methods for rare events. The 
most widely used method for generating the Transition 
Path Ensemble is Transition Path Sampling (TPS) 
TPS samples the TPE using a dynamic Markov Chain 
Monte Carlo algorithm. Here, a new path is generated by 
shooting off trajectories in the forwards and backwards 
directions from a point in the old path, after slightly 
changing its momentum co-ordinate. The new path is 
then accepted or rejected, usually via a Metropohs ac- 
ceptance/rejection criterion (which requires knowledge of 
the phase space density of the new initial point, mean- 
ing that TPS cannot be used for non-equilibrium sys- 
tems). The acceptance criterion is optimized by tuning 
the maximum momentum displacement. However, even 
with deterministic dynamics, the Lyapunov instability of 
the system is often so large that when the smallest mo- 
mentum displacements possible are used, the trial tra- 
jectories still diverge in a few picoseconds from the old 
ones; for stochastic dynamics, the situation is likely to be 
worse. TPS alleviates this problem by mainly shooting 
off trajectories near the top of the barrier; however, this 
drastically hampers the relaxation of the transition paths 
and, as a result, TPS is inefficient for transitions that take 



longer than a few picoseconds. The Lyapunov instability 
also explains why TPS cannot conveniently be adapted 
to simulate non-equilibrium systems by only shooting in 
the forward direction, in the manner of the methods de- 
scribed here: shooting in the forward direction from early 
points in the transition paths is very unlikely to succeed. 
The non-equilibrium scheme of Crooks and Chandler Q 
is also expected to suffer from trajectory divergence for 
multi-dimensional systems. The methods described in 
this paper suffer much less from these problems associ- 
ated with the Lyapunov instability. This is because trial 
runs which are fired from interface Xi are only required 
to reach A.^+i in order to make a contribution to the path 
ensemble. If the distance between interfaces were to be 
very large, the Lyapunov instability might lead to prob- 
lems in reaching A^+i, but in this case, the interfaces can 
simply be positioned more closely. These methods should 
therefore prove useful for studying diffusive rare events. 

The schemes presented here use the same formulation 
for the rate constant as the Transition Interface Sam- 
pling (TIS) method of Van Erp ei o/ 0, Q. In TIS, 
however, paths from A to interface are sampled us- 
ing the "shooting" methodology of TPS. Although TIS 
is generally more efficient than TPS for rate constant cal- 
culations, like TPS, it cannot be used for non-equilibrium 
systems, since knowledge of the phase space density is re- 
quired. TIS also suffers from the Lyapunov instability in 
the same way as TPS and is therefore only suitable for 
very short transition paths. 

Other schemes have also been proposed which use a 
series of interfaces between regions A and B, including 
Partial Path Transition Interface Sampling l3^ and 
Milestoning 1 1] . These methods assume that the points 
in the TPE at the interfaces are distributed according 
to the stationary phase space density - for example, the 
Boltzmann distribution. This allows them to be used for 
diffusive problems where transition paths are long. How- 
ever, this assumption is unjustified in many cases |l,ll |. 
even for equilibrium problems. Moreover, these methods 
cannot be used for non-equilibrium problems where the 
phase space density is unknown. 

For the methods described in this paper, the use of 
interfaces does not involve any assumptions about the 
transition mechanism, or about the transition paths. The 
role of the interfaces is simply to improve the efficiency of 
the sampling; they have no effect on the transition paths 
that are obtained. This is because the final points of the 
trial runs from interface i — 1 are used as the initial points 
of the trials from interface i, so that the correct dynamics 
of the system is preserved throughout the transition path. 
The points at the interfaces are not assumed to follow 
the steady state phase space distribution. In fact, for the 
genetic switch, we find that the distribution of points at 
the interfaces is very far from the steady state one [3| ■ 

It is interesting to make some general points on the 
nature of the path sampling in the different schemes dis- 
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cussed here. The FFS and BG methods proposed here 
are examples of static Monte Carlo schemes, in which 
new paths are generated independently of previous paths. 
The RB/M scheme could be interpreted as a dynamic 
Markov Chain Monte Carlo algorithm, since newly gen- 
erated paths are compared with previously generated 
ones. However, new trajectories are here generated from 
scratch, in contrast to most dynamic importance sam- 
pling algorithms (including TPS), where previous paths 
are used to generate new ones. The methods described 
here have the general advantage of static schemes that 
they are less likely to get stuck in particular regions of 
state space, a common problem in dynamic importance 
sampling schemes. 

In this paper, we have demonstrated that all three 
of the methods provide a dramatic improvement in ef- 
ficiency over brute force simulations, for calculations of 
the rate constant. For the problems studied here, the effi- 
ciency of all three methods was roughly of the same order 
of magnitude, with the RB method being slightly less ef- 
ficient. A much more detailed study of the efficiency of 
the methods, in which analytical expressions are derived 
for the computational cost of the three algorithms and 
for the statistical accuracy of the resulting estimates of 
the rate constant, will be presented in a future publi- 
cation This should allow systematic optimization 
of the choice of parameters, for particular methods ap- 
plied to particular problems. The choice of which method 
to use may depend not only on the computational effi- 
ciency with which the rate constant can be calculated, 
but also on practical issues such as the fact that the BG 
and RB methods require less storage of system config- 
urations than FFS. In cases where one is interested in 
analysing the TPE to obtain information on the transi- 
tion mechanism, the RB method may be preferable, since 
it generates unbranched transition paths in a convenient, 
one-at-a-time fashion. 

The methods described in this paper are only suitable 
for stochastic dynamical schemes, since they rely on the 
fact that many trials can be fired from one initial point. 
Many rare events are simulated using Molecular Dynam- 
ics, which is generally entirely deterministic. Is it possi- 
ble to use schemes of this type for Molecular Dynamics 
simulations? Our view is that it is indeed possible, if a 
weak Andersen thermostat is used as a noise gener- 
ator. This approach was used by Bolhuis to apply TPS 
to diffusive barrier crossings [3J|. As long as the noise 
source does not increase the timescale of the Lyapunov 
divergence, it is unlikely to disturb the dynamics of the 
system. Further investigation in this direction is planned. 

Finally, the methods as formulated here are suitable for 
non-equilibrium systems in stationary state i.e. systems 
where detailed balance is not obeyed, there are fluxes in 
phase space and the phase space density is not known, 
but nevertheless the average properties of the system are 
time- independent. These conditions apply to a wide class 



of systems which have not previously been accessible to 
rare event simulations. However, another very interesting 
class of non-equilibrium rare events occurs in systems 
that are not in stationary state - for example, systems 
with a time-dependent external driving force. In future 
work, we aim to investigate under what circumstances 
these methods can be used for time-dependent problems. 
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JUSTIFICATION OF THE ALGORITHMS 
Averaging of probabilities 

In this section, we comment on expressions and ^ 
for the rate constant Uab- Eq.(^ states that kAB is the 
time-averaged flux ^A,n of trajectories reaching A„, com- 
ing from A, per unit of time that the system spends in 
state hA = 1. This is then equal to the time-averaged 
flux $^^0 of trajectories crossing Aq for the first time 
since leaving A, multiplied by the probability P(A„|Ao) 
that any one of these trajectories will subsequently reach 
An = Ab, before returning to A. Eq.(|21l states that for a 
particular trajectory, P(A„|Ao) is equal to the probabil- 
ity of reaching Ai from Aq, then, given that Ai has been 
reached, of subsequently reaching A2, and so on. In the 
Branched Growth method, P(A„|Ao) is indeed estimated 
for individual trajectories: for a particular starting point 
at Ao, the product nr=o^ -^('^i+i l'^*) explicitly evalu- 
ated by creating a branching tree of paths and counting 
the number of branches that reach A„ . An average is then 
taken of this estimate over many such branching paths. 
In the Branched Growth method, therefore, we obtain 

n-l 

P(A„|Ao)sG = (H ^'(A^+i|A.))ao (14) 

where the notation ()ao denotes an average over all paths 
which begin from Aq. 

In the FFS and Rosenbluth methods, however, aver- 
ages are taken over the estimates of P(Ai+i|Ai) for each 
interface i, and these averages are multiplied: 

n-l 

P{Xn\\o)FFS/RB = X{{P (15) 
i=0 

In Eq. (|15|) . denotes an average over all paths which 
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extend from ^ to Ai . We now demonstrate that Eqs H14(l 
and IjlSfl are consistent. 

Beginning with Eq. (|14|l . we multiply and divide by 

(nr=o ^(a.+i|a.))ao: 



(p(A„iA„_i)xnr=o'm+iiA.))A, 



P(A„|Ao) 



BG 



(nr=o ^(AmiAO)Ao 



n-2 



x(n^(^'^+ii^^))Ao (16) 

1=0 

n-2 

(F(A„|A„_i)}a„_i X (J];p(A,+i|A,))ao 



since YI^^q P{K+i\^i) is the weighting factor for a par- 
ticular path that starts from Aq, in the ensemble of paths 
that connect Aq to A„_i. We now multiply and divide by 
(nro'^(A.+i|AO)Ao: 



n— 3 



P(A„|Ao)bg = (F(A„|A„_i))a„_i X ([|P(A,+i|A,;))ao 

i=0 

^ (P(A„,i|A„_2)nr=o'^(Ai+i|A.))Ao 

(n:ro'^(A.+iiA.))Ao 

= (F(A„|A„_i))a„_i X (P(A„_i|A„_2))a„_. 

n— 3 
1=0 

Extending this analysis, we arrive at the result that 

-P(A„|Ao)bG = P{^n\^o)FFS/RB- 

Weights of paths 

In this section, we show that all three methods sample 
the true transition path ensemble (TPE): i.e. that tran- 
sition paths are sampled with the correct weights. We 
define the TPE to be all paths that would be obtained in 
an infinitely long brute force simulation run, which obey 
the conditions that their first point is in A, their last 
point is in i3, and all other points lie between A and B. 
These paths can consist of any number N of simulation 
steps. The weight of a particular transition path in the 
TPE is: 



P{{x})^ C0{Xa ~ Hx„))p{x„) 



(17) 



^-2 



X Y\_ Pi.t+i(^{H^t+i) - Aa)^(Ab - A(xi+i)) 

1=0 

XpN^lM^iH^N) - As) 

where Pi^i+i = p{xi aj^+i), the probability of making 
a step from point Xi to point x^+i, Po{xq) is the steady- 
state phase space density of the first point in the path, 
which is in region A, and C is a normalization constant. 



The first term in Eq.^TJ is the phase space density of 
the initial point; the 0- function ensures that point xq lies 
in A. The next term is a product over all simulation 
steps i in the path, except the last point: pis+i is the 
probability of taking a particular simulation step and the 
(^-functions ensure that point Xi+i lies between Xa and 
As. The final 6'-function ensures that the last point in 
the transition path lies in the B region. 





FIG. 11: Illustration of the division of a path into partial 
paths, (a) A path which begins in A and reaches B. Points 
2-6 belong to the partial path Y-\, points 7-17 to Yq and 
points 18-42 to Y2. (b) A path which begins and ends in 
A. Partial paths are coded as follows: Y-\ open circles, Yq 
squares, Y\ triangles. 



We now divide the transition path into a series of par- 
tial paths. A partial path Yj , consisting of a successive set 
of points {y'f^ . . . is defined to be a part of a tra- 

jectory (the whole trajectory being {xq . . . xn}), which 
begins just after the trajectory crosses interface Xj for 
the first time, and ends just after it crosses either Aj+i 
or Xa- The first partial path is denoted Y-i- This be- 
gins just after the trajectory leaves A and ends just after 
it crosses Aq for the first time, or returns to A. Figure 
^Jillustrate the division of two different trajectories into 
partial paths. In Figure [TTk . for example, y[ — X2, 

(0) (0) (0) 

= ^5 - 2;6, - XT, y)^'^ = y\^ = 0:17, 

2^18 7 — ~ ^42- We also define a "suc- 
cess" function, ^[i^m], by 



m] = 1 

if partial path Yj ends at Aj+i and 



(18) 



(19) 



if partial path Yj instead ends in region A. For exam- 
pie, in Figure CBi, ^[y_i] = 1, ^^0 = 1 and ^[^1 - 1, 
whereas in Figure lllb . = I7 "^[^0] = 1 and 

i[Yi] =0. 

Denoting the initial point of the path yA = xq, we can 
now rewrite Ea. H17|l as: 



V{{x})= Ce{XA-X{yA))po{yA) 



(20) 



n n 



Pi.i+i 
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where the innermost product is now over all points in 
the transition path which belong to partial path Yj - the 
^-functions of Eq. H17|l are implicit in the definition of Yj . 
The factor ^[Yj] ensures that each partial path reaches 
the next interface. The final step xjv-i ^ a: at is included 
in partial path Yn-i. 



The FFS and BG methods 

The FFS and BG methods begin with a simulation 
run in the basin of attraction of A, from which points 
are collected immediately after the simulation crosses Aq. 
The probability distribution for the partial paths that 
connect A to Aq is: 



which were used to produce the paths. The accep- 
tance/rejection procedure therefore depends on all trial 
runs, not just the ones that are selected and form part of 
the transition path. In order to demonstrate the validity 
of the method, we consider the probability of generating 
and accepting a particular "decorated transition path" - 
by which we mean a transition path from A to B, to- 
gether with its kj — 1 attendant unselected trials for each 
interface j. 

Let us suppose that we have reached interface Xj in 
the RB path generation procedure. The probability of 
generating a particular trial run (or "trial partial path" ) 
Yj to Xj+i or Xa is: 



Pi,i+1 



(23) 



V-,i{x}) = C-i9{XA-XiyA))poiyA) 



ieY>' 



ieY^i -I where Di is a normalization constant and the product is 

(21) over all steps in the trial run Y''. Having generated a set 



Here, the ^-function ensures that the initial point i/a lies 
in region A. pqIi/a) is the steady-state phase space den- 
sity for point i/a- The product is over all the points in 
partial path which connects region A to Aq, and the 
factor ^[I'-i] ensures that the path reaches Aq rather than 
returning to A. Finally, C_i is a normalization constant. 
Having obtained the point at Ao , trial runs are then used 
to extend the transition path to subsequent interfaces. 
In FFS, points are selected at random from a collection 
at Aq, while in BG, fco trials are run from each point at 
Aq. However, this makes no difference to the probability 
distribution for the resulting paths that connect Xa to 
interface j, which is: 



of kj trial runs, the probability of selecting a particular 



one, Y* , to extend the chain to the next interface is: 



(24) 



where the index b runs over all the kj generated trial runs 
Yj". We now consider the generation of a new decorated 
transition path, consisting of a chain of partial paths Yj 
for —l<j<n. The probability of obtaining a particular 
path leading from A to Xq is, as in Ea. (|^ : 



V-ii{x}) = C-i9{XA-X{yA))poiyA) 



'Pjiix})^ C,e{XA-X{yA))po{yA) (22) 

X n n p 



m— — 1 



Once again, the inner product is over all points that form 
part of partial path Y^. Extending this analysis to the 
n-th interface, we obtain the result that the FFS and BG 
methods sample paths according to the correct distribu- 
tion function, given by Ea. H20() . 



The RB method 

We now turn to the Rosenbluth method, imple- 
mented with the Metropolis acceptance/rejection scheme 
(RB/M), as described in Section „ We show that this 
method generates paths with the correct weights, as given 
by Ea.()20|l. and we point out some differences between 
the RB/M method and the Rosenbluth procedure usually 
used for polymer sampling JJ. 

In the RB/M method, the Rosenbluth weights 
and wj°\ which are compared in the accep- 
tance/rejection step, depend on all the trial runs 



(25) 

We then shoot kj trial runs at each interface Xj, for 
< j < n. At each interface, we denote the trial run 
that is selected by an asterisk and the kj — 1 trials which 
are not selected by the index 6'. The probability of gen- 
erating a particular decorated transition path, consisting 
of selected trial paths Y* and unselected trial runs Yj 
is: 



p3-"{n) ^ C'e{Xo - X{yA))po{yA) 



n-l r 



X Yl P^^'''[Y*]P'^^[Y*] Y[ 

3=0 



b' = l 



(26) 



Having generated a decorated transition path (here de- 
noted n), we now compare its Rosenbluth factor to that 
of the last decorated transition path that was accepted 
(here denoted o). The probability P'"^'^(o n) of accep- 
tance obeys the relation: 



(n) 



(o) 



(27) 
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where 14^/"'' and W^"' are the Rosenbluth factors: 



n — 1 



j-0 b^l 



n — 1 ^ J 
j=Q b=l 

(28) 

where the index h runs over all (selected and unselected) 
trial runs at interface j which belong to the decorated 
transition path. 

The flow of probability during the path sampling pro- 
cedure from decorated path o to decorated path n is given 
by: 



/C(o n) = 7V(o)Pf™(n)P'^'='=(o ^ n) 



(29) 



where M{6) is the weight of the old augmented path in 
our ensemble. When our sampling reaches a steady state, 
detailed balance will be obeyed in the space of decorated 
paths: 

/C(o n) = /C(n ^ o) (30) 
Substituting Eqs (gSJ-® into we find that: 
N{n) _ e{\A-\{yA{n)))pa{vA{n)) 



N{o) 0(Aa-A(j/^(o)))po(2/a(o)) 
C[^-i(n)]n»eF_i(n) P^,i+i 

C[5^-l(o)] n»GF-i(o) P^Ml 



(31) 



, n;Co [y; {^miy* (n)] wIlI p'^^ i^;"' (n)] 



n;=o p^^-[Y*iomY*{o)] utu p«^"[i^'''(o)] 

from which we can conclude that a particular decorated 
path is sampled by the RB/M method with weight: 



Af{{x}) ^ 0{\a - X{yA))poiyA)aY-i] J] ^^M+i 

n-l kj-1 
3=0 



(32) 



b' = l 



where again, the index b' denotes unselected trial runs. 
We would now like to know the weight with which a 
particular undecorated transition path is sampled in the 
RB/M method. This weight is given by the sum of 
7V({x}), taken over all decorated paths which have iden- 
tical backbone chains: i.e. which represent identical tran- 
sition paths, decorated by different sets of unselected trial 
runs. We know, however, that 



fei-i 



E n p'^^iYf]^! 



(33) 



fc'=i 



where ^ denotes a sum over all possible combinations 
oi ki — 1 unselected trials from interface i. Taking this 
sum over the distribution function of Eq. 1)32(1 and sub- 
stituting in Eq. H23|l , we find that the Rosenbluth method 



indeed samples transition paths with the correct weight 
(HOI. 

The RB/M method described in this paper differs from 
the well-used Rosenbluth technique for polymer sampling 
|l||. There, the Rosenbluth factor of the newly gener- 
ated polymer configuration is not compared to that of 
the previously accepted configuration, but rather to that 
of a randomly chosen chain from the system. Moreover, 
the Rosenbluth factor of this chain must be re-calculated 
(by generating a new set of trial moves) when the chain 
is selected, rather than being stored when the configu- 
ration was first generated. This is necessary in the case 
of polymers because of the interactions between polymer 
chains, which depend on the current state of the system. 
The RB/M technique of Section„ which is much less com- 
putationally intensive, is appropriate for path sampling 
because of the absence of interactions between different 
transition paths. 



"WASTE- RECYCLING" 

In Section „ we described the implementation of the 
Rosenbluth path sampling scheme, with a Metropolis- 
like acceptance/rejection procedure for reweighting the 
paths. Correct reweighting can also be achieved using 
an alternative approach, in which ensemble averages are 
computed over all generated paths, taking explicit ac- 
count of their weights. This scheme, known as "Waste- 
Recycling" , was originally proposed by Frenkel , in a 
Monte-Carlo simulation context. 

Let us suppose that we wish to compute the average 
of a quantity X for paths in the TPE. We know that 
paths from ^ to _B which are generated by the Rosenbluth 
scheme should be weighted according to their Rosenbluth 
factors W = n"Jo^^^^^- We could, of course, simply 
compute 

where the index b refers to the individual paths which are 
generated and Q is the total number of generated paths. 
The problem with this is that, as the path sampling pro- 
ceeds, both the numerator and denominator of Eq. 1)34(1 
will increase in proportion to the number of paths sam- 
pled. At the end of a long sampling run, one will be faced 
with the problem of dividing two enormously large num- 
bers. The "Waste Recycling" scheme avoids this prob- 
lem. 

In order to use Waste Recycling to obtain the 
probabilities P(Ai+i|Ai), as well as {X)tpe for any 
chosen property X of the transition paths, the following 
procedure is used: 

(l)Choose a number Uc (typically, ric w 3 — 10). Define 
values X'^"™ — for all properties X of the TPE which 
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one wishes to compute. For each interface (0 < i < n), 
define values nn = 0, Ci — 1, p™™ — and arrays Wi 
and Pi, of size ric. Define an array of transition paths V, 
also of size ric- 

(2) Begin or continue a simulation run in the A basin. 
When the system leaves A and crosses Ao, suspend this 
run. Denote the system configuration as xq. Set i 0, 
Wo[co] = l. 

(3) Initiate ki trial runs from Xi. Continue each trial run 
until either A^+i or Xa is reached. Calculate the number 



Ns^'' of trials which reach A^+i. Set pi[ci] — Ns^' /ki and 

W^i+i[ci+i] = Wi[c^] * N^/\ Increment Ci + 1. 

(4) If Ci < 71c + 1, continue to step (5). Otherwise {i.e. if 

Ci = ric + 1) , increment rrii — > + 1 and 



(0 



Pi 



(35) 



Select one member of the array Wi with probability 
Pseiib*) = W,[b*]/J:ZlW^[b]. Set W,[l] = W^b*], 
Pt[l] = pt[b*] and = 2. 

(5) If Tvi*^ > 0, choose one successful trial at random. 
Denote the final point of this trial as Xi^i. Add the con- 
figurations corresponding to this trial run to the path 
Vlct]. Set j ^ i + 1. Otherwise (if Tvi'^ = 0), return to 
step (2). 

(6) Repeat steps (3)-(5) until i — n. 

(7) If i = n: Increment c„ ^ c„ + 1. 

(8) If c„ < Uc + 1: return to step (2). Otherwise {i.e. if 
Cn = ric + 1)'. increment to„ mn + 1 and 



Xc 



Xc 



E:UWn[b]X[P[b]] 
EZlWnib] 



(36) 



Select one member of the array Wn with probability 

Psel{b*) = W4b*]/j:'^^,Wn[b]. Set Wn[l] = W^[b% 

-pll] = rib*] and Cn = 2. 

(9) Repeat steps (2)-(8) many times. 

(10) For each interface < i < n, calculate 
P{Xi+i\Xi) =pf"™/mj. Calculate {X)tpe = ^c«m/m„. 

The key idea of Waste Recycling is that one generates 
paths in "groups" - each group having ric members. Once 
a group of ric paths has been generated, the quantity 
Wn[b]X[r[b]]) I {Y2U is added to the cu- 

mulative average for the property X. One member of the 
group is then selected with probability proportional to its 
Rosenbluth weight W , to become the first member of the 
subsequent group. The algorithm described above above 
also includes separate "grouping" procedures for every 
interface: the index a denotes the position of the partial 
path in the "group" connecting A to Ai and Wi[ci] de- 
notes the Rosenbluth factor of this partial path as given 
by Eq.Q. Once a group of partial paths connecting A to 
Xi contains ric members, the 

mented by {^'L^ VF,[6]p,[6]) / ^I'L^ Wi[6]) and one par- 
tial path is chosen with probability proportional to Wi 



to be the first member of the next group. This grouping 
procedure at each interface is necessary in order to cor- 
rectly evaluate P{Xi+\\Xi) using the partial path weights 
given by Eq. Q . 

In general. Waste Recycling can lead to very large in- 
creases in efficiency for Monte Carlo schemes in which 
a large set of possible moves (here paths) are gener- 
ated, after which only one is accepted. This is not the 
case for our Rosenbluth path sampling scheme, where 
paths are generated one at a time. We therefore ex- 
pect only a moderate, if any, increase in efficiency for the 
Waste Recycling scheme as compared to the Metropolis 
acceptance/rejection approach, for this particular appli- 
cation. In fact, as shown in Tables and IIIII the 
efficiency of the Waste Recycling and Metropolis accep- 
tance/rejection schemes are comparable for the two test 
cases investigated here. Nevertheless, we have described 
and tested the scheme for the sake of clarity, complete- 
ness and future reference. 



PRUNING 

For some problems, propagating trial paths from A^ 
back to Xa may be a major computational expense. In 
this case, computational efficiency could be enhanced 
using "pruning" - in analogy to the Pruned-Enriched 
Rosenbluth Method for polymer sampling [3, 0| • In the 
context of path sampling, this means that trial runs from 
Xi are not continued until they reach A^, but are rather 
terminated with probability Pp on reaching the preceding 
interface Ai„i. Surviving paths are re- weighted in order 
to maintain correct sampling of the TPE. We now dis- 
cuss briefly the implementation of the pruning procedure 
for the three methods, and show for the polymer translo- 
cation problem of Section b that the procedure leads to 
correct results for the rate constant. 



FFS 

The FFS algorithm proceeds as described in Section □ 
until an interface i is reached, such that Ai_i > A^^. Each 
of the Ni points in the collection at Xi is then assigned 
a weight /^*^ = 1. Selecting points at random, we carry 
out trial runs to A^+i. If a trial run arrives at Ai_i, it is 
terminated and counted as a "failure" {i.e. it is counted 
as if it had reached A^), with probability Pp~^^ . The 

fi— 1) 

run continues with probability 1 — Pp , and its weight 
is multiplied by 1/(1 — Pp^^). If it subsequently reaches 
Ai_2! it is terminated with probability Pp' and con- 
tinues with probability 1 — Pp* , with a weight which 
is now 1/[(1 — Pp~^)(l — Pp~^)]. This process is contin- 
ued until the trial run is terminated, it reaches A^ or it 
finally arrives at A^+i. The "number of successes", 
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is now given by the sum of the weights of aU successful 
trials from arriving at A^+i. On beginning the next 
trial run procedure, from Ai+i to we choose points 

from the collection at Ai+i with probability proportional 
to their weights /^'^. Each of the new trial runs then 
begins with weig ht = 1. After performing M,+i 

trials, the "number of successes" Nl^^ is the sum of the 
weights of all successful trials, and points at Ai+2 

are subsequently chosen according to their weights . 
Note that all trial runs begin with unit weight and not 
with the weight of their starting point in the collection 
at A7 . 



weights of the trials that eventually reached Ai+i. This 
affects the evaluation of the Rosenbluth weight of the par- 
tial path up to interface i: Wi = n}=i ^s^^ ■ This weight 
is compared with that of the previously accepted partial 
path up to interface i, and, if accepted, P^[^{Xi+i\Xi) 
becomes P,^|J„(A,+i |A,) = N^^ /h. If TV] > 0, then one 
of the successful trials is chosen with probability propor- 
tional to its weight f'^^\ The final point of this path be- 
comes the starting point for shooting trials to the next in- 
terface, each of which begins with unit weight = 1. 

Test of the pruning algorithms 



Branched growth 

In the Branched Growth method, as described in Sec- 
tion „ a branching "tree" of paths is created, in which ki 
trials are fired from each "parent" branch at interface i. 
Without pruning, the weight of each of the ki "daugh- 
ter" branches is the weight of the "parent" branch, mul- 
tiplied by \/ki. When pruning is included, these weights 
are modified. Suppose a trial run begins from inter- 
face i with weight h^^\ This weight h^^^ will be equal 
to 1/ n}=i 1 rnultiplied by any factors due to prun- 
ing events that have occured during the generation of 
the path from Xa to A^. Now suppose that this trial run 
does not proceed directly to A^+i, but rather goes back to 
Ai_i. It will then be terminated with probability Pp^^\ 
However, let us suppose that it survives (with probability 
1 — Pp ^^). Its weight now becomes h^^^ /{I — Pp ^•'). 
If it subsequently continues in the backward direction as 
far as Xi-2 and survives the pruning procedure there, its 



Pp ^■')], and so on. 



weight will be /i(*V[(l " -fp'~^^)(l 
Due to the pruning procedure, not all branches reaching 
a particular interface will have the same weight [in the 
absence of pruning, the weight of all branches reaching 
Ai is l/n}=i The final result for Pb is given by the 
sum of the weights of all branches that finally reach A b ■ 



Rosenbluth 

The Rosenbluth path sampling method is modified by 
pruning in a similar way to FFS. We focus here only 
on the Metropolis acceptance/rejection version of the 
method. Having generated a point at interface 1 using 
a free simulation in region A, we proceed as described 
in Section „ until we reach an interface i, such that 
Ai_i > Xa- We make ki trial runs from this interface. 
Each trial run begins with weight Z*^*' = 1. As for FFS, 
trial runs that reach Xi-s are terminated with probability 
Pp~^ and otherwise continue with weight multiplied 
by 1/(1 — Pp~^)- After the ki trials are completed, the 
"number of successes" Ns'"'^ is defined as the sum of the 



/ X 10^^ Pb X 10"^ kAB X 10^* N^t x 10* 

FFS 1.085 ± 0.004 1.38 ±0.02 1.50 ±0.02 4.1 

BG 1.081 ± 0.004 1.36 ±0.02 1.47 ±0.02 2.5 

Rb/M 1.091 ±0.003 1.32 ±0.02 1.44 ±0.03 4.1 

Rb/WR 1.082 ±0.003 1.31 ±0.03 1.42 ±0.03 8.2 



TABLE IV: FFS and brute force results for / = <^A,o/hA, 
P{Xn\Xo) and /cab, for the polymer translocation problem of 
Section □ with pruning probability Pp = 0.5 at all interfaces. 
Units of / and kAB are Da~^ . Errors represent the standard 
error in the mean of a series of independent estimates. Nst 
is the approximate number of simulation steps performed in 
arriving at the result given in the table. 

In order to demonstrate that the pruning procedure 
described above does lead to correct path sampling, we 
have repeated the polymer translocation calculations of 
section „ using a pruning probability Pp — 0.5 for all 
interfaces. This value for Pp was chosen arbitrarily. All 
parameters remained the same as those of Section □ the 
initial polymer parameter set was used. Table Hvl shows 
the results obtained: on comparison with Table ^ it is 
clear that the pruning procedure indeed leads to correct 
results. Comparing also the total number of simulation 
steps required to obtain the results of Table IIVI we find 
that no dramatic improvement in efficiency is achieved 
by using pruning for this system. For this reason, we did 
not attempt to optimise Pp. Nevertheless, pruning may 
be of use for other systems. 
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